Inferring short tandem repeat variation from paired-end short reads

Authors

  • Minh Duc Cao
  • Edward Tasker
  • Kai Willadsen
  • Michael Imelfort
  • Sailaja Vishwanathan
  • Sridevi Sureshkumar
  • Sureshkumar Balasubramanian
  • Mikael Bodén
Abstract

Advances in high-throughput sequencing offer an unprecedented opportunity to study genetic variation, but exploiting it is hampered by the difficulty of resolving variant calls in repetitive DNA regions. We present a Bayesian method to estimate repeat-length variation from paired-end sequence read data. The method makes variant calls based on deviations of sequenced fragment sizes from their expected values, allowing the analysis of repeats at lengths relevant to a range of phenotypes. We demonstrate the method's ability to detect and quantify changes in repeat length across genotypes from short-read genomic sequence data. We use the method to estimate repeat variation among 12 strains of Arabidopsis thaliana and demonstrate experimentally that it compares favourably with existing methods. Using this method, we identified all repeats across the genome that are likely to be polymorphic. Our predicted polymorphic repeats also included the only known repeat expansion in A. thaliana, suggesting that the method can discover potentially unstable repeats.
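To illustrate the core idea in the abstract (inferring a repeat-length change from how far fragment sizes deviate from expectation), here is a minimal Python sketch. It is not the authors' model. It assumes that read pairs spanning a repeat have fragment sizes drawn from a normal distribution with a known library mean and standard deviation, that an expansion of d bp in the sample shortens the fragment size measured in reference coordinates by d bp, and that a uniform prior is placed on a grid of candidate changes. The function names `log_normal_pdf` and `posterior_repeat_change` are hypothetical.

```python
import math
from typing import Dict, Iterable, Sequence


def log_normal_pdf(x: float, mean: float, sd: float) -> float:
    """Log density of a Normal(mean, sd) distribution at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi))


def posterior_repeat_change(
    fragment_sizes: Sequence[float],
    lib_mean: float,
    lib_sd: float,
    candidates: Iterable[int],
) -> Dict[int, float]:
    """Posterior over candidate repeat-length changes d (bp), uniform prior.

    Assumed model (illustrative only): a pair whose true fragment length is
    L ~ Normal(lib_mean, lib_sd) and that spans a repeat expanded by d bp
    appears on the reference as a fragment of size L - d, so each observed
    size x ~ Normal(lib_mean - d, lib_sd).
    """
    log_post = {
        d: sum(log_normal_pdf(x, lib_mean - d, lib_sd) for x in fragment_sizes)
        for d in candidates
    }
    # Normalise with the log-sum-exp trick to avoid underflow.
    top = max(log_post.values())
    norm = sum(math.exp(v - top) for v in log_post.values())
    return {d: math.exp(v - top) / norm for d, v in log_post.items()}


if __name__ == "__main__":
    sizes = [468, 472, 470, 471, 469]  # toy data, not from the paper
    post = posterior_repeat_change(sizes, 500.0, 30.0, range(-60, 61))
    best = max(post, key=post.get)
    print(best, round(post[best], 3))
```

With five toy pairs centred on 470 bp from a 500 bp library, the posterior peaks at d = 30, a 30 bp expansion relative to the reference. A practical caller would additionally need to handle mapping artefacts, heterozygous or mixed alleles, and library-specific size distributions.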

Volume 42

Publication date: 2014